ABSTRACT


Store  Information  in  DNA 


We have designed and performed a proof-of-principle
experiment demonstrating that a huge amount of
non-biological, or abiotic, information can be stored in
a memory composed of DNA molecules. The preliminary
experiment emphasizes achieving a practical design and
motivates several fundamental questions, such as the
amount of information that can be stored in a DNA memory
before errors are introduced, and practical and
cost-effective ways of mapping abiotic data onto DNA
sequences.

1.  INTRODUCTION 

The DNA memory implementation consists of two major
stages that progressively add capability. The first stage
is called a Read/Write (R/W) DNA memory, in which
practical storage and accurate retrieval of information
are demonstrated. Here, with a library of 800 words, we
created and stored a database of customer information in
DNA. We also performed a search-and-retrieve operation on
the DNA database. Finally, we showed that with a
divide-and-conquer method we can read out the retrieved
information with a DNA microarray containing the 800-word
library.

1.1  Store  Abiotic  Information  in  DNA 
As illustrated in Figure 1, we first constructed a word
library of random 40-mer DNA sequences by cloning the PCR
product of this library into plasmids. After transforming
the plasmids into E. coli host cells, we generated a
collection of E. coli strains, each containing a unique
40-bp sequence that can be amplified and isolated by
cutting it out at the two built-in restriction enzyme
sites flanking the 40-mer sequence. Next, we assigned one
word to each unique 40-mer sequence (fewer than 50,000
words are needed for practical communication in English).
Then, we joined the words with DNA ligase to form each
sentence or short information entry independently. The sum
of all the sentences or entries is our database; small
samples are shown at the bottom of Figure 1.
Figure 1
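
The word-to-sequence mapping described above can be illustrated in software. The following is a minimal sketch, not the laboratory procedure: it assigns a distinct random 40-mer to each word and models ligation as plain string concatenation. The vocabulary, the function names, and the fixed random seed are assumptions made for illustration, and the plasmid cloning and restriction sites are deliberately left out.

```python
import random

BASES = "ACGT"
WORD_LENGTH = 40  # each library word is a 40-mer, as in the text


def random_40mer(rng):
    """Draw one random sequence of WORD_LENGTH bases."""
    return "".join(rng.choice(BASES) for _ in range(WORD_LENGTH))


def build_library(vocabulary, seed=0):
    """Assign a distinct random 40-mer to every word (hypothetical helper)."""
    rng = random.Random(seed)
    library = {}
    used = set()
    for word in vocabulary:
        seq = random_40mer(rng)
        while seq in used:  # redraw on the (very unlikely) collision
            seq = random_40mer(rng)
        used.add(seq)
        library[word] = seq
    return library


def encode(sentence, library):
    """Concatenate the word sequences; this stands in for ligation."""
    return "".join(library[word] for word in sentence.split())


def decode(strand, library):
    """Split a strand into 40-mers and look each one up in the library."""
    reverse = {seq: word for word, seq in library.items()}
    chunks = [strand[i:i + WORD_LENGTH]
              for i in range(0, len(strand), WORD_LENGTH)]
    return " ".join(reverse[chunk] for chunk in chunks)


# Illustrative vocabulary; the experiment used an 800-word library.
vocabulary = ["alice", "smith", "little", "rock", "arkansas"]
library = build_library(vocabulary)
entry = encode("alice smith little rock", library)
```

Because every word has the same length, an entry of four words is a 160-base strand, and decoding only needs to cut the strand at multiples of 40.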

1.2 Retrieval and Read-out of the Stored Information

To retrieve and read the stored information, we first pass
the DNA database through an ssDNA column carrying the
sequence of the word of interest. The DNA retrieved by
Watson-Crick hybridization (sentences or entries) can be
used as the probe on a DNA microarray in which each spot
represents one word. The contents of the retrieved
sentences can be read out by reference to the DNA
microarray. The retrieved DNA can be further separated
according to another word by passing it through another
column, until a single sentence or entry can be read out
from the DNA microarray (see Figure 2).
Figure 2
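
The retrieval and read-out logic can be sketched as set operations. This is a software analogy under stated assumptions, not a model of hybridization chemistry: each entry is represented as a tuple of words, a column is modeled as a filter on one word, and the microarray read-out is modeled as the set of words present in the retrieved pool. The sample entries and the function names are hypothetical.

```python
def retrieve(database, word):
    """Keep entries containing `word`; stands in for capture on an ssDNA column."""
    return [entry for entry in database if word in entry]


def read_out(entries):
    """Return the set of words whose microarray spots would light up."""
    return set().union(*entries)


def narrow(database, query_words):
    """Pass the pool through successive columns until one entry remains."""
    pool = list(database)
    for word in query_words:
        pool = retrieve(pool, word)
        if len(pool) <= 1:
            break
    return pool


# Hypothetical customer entries, for illustration only.
database = [
    ("alice", "smith", "little", "rock"),
    ("bob", "smith", "conway"),
    ("alice", "jones", "conway"),
]

first_pass = retrieve(database, "smith")          # two entries survive
mixed_spots = read_out(first_pass)                # words from both entries
single = narrow(database, ["smith", "conway"])    # one entry survives
```

After the first column the read-out mixes words from two entries, so the array alone cannot say which words belong together. The second column removes that ambiguity, which is the point of the divide-and-conquer step.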


2. DISCUSSION

2.1 Why DNA?

Advances in information technology have produced vast
amounts of data. A study by the University of California
at Berkeley (Lyman et al., 2000) estimates that 1 to 2
exabytes (1 exabyte is $10^{18}$ bytes) of information are
produced worldwide each year. In addition, the amount of
information published in periodicals and books doubles
every few years (Wurman, 1989), and the Internet now has
search engines indexing 3 billion URLs (Google, 2003). As
the amount of data continues to increase, the ability to
store it and search for relevant information diminishes.
DNA has many attractive properties as a storage medium
outside of the cell. A gram of DNA, which can easily be
dissolved in a volume of one drop of water, can
potentially store ~$10^{21}$ bits of information.

To take an example, there are approximately 500,000 words
in the Oxford English Dictionary (OED). There are $4^{20}$
(~$10^{12}$) DNA molecules of length 20 base pairs (bp).
Thus, the content of the OED is a small fraction of the
total number of 20-mers and could easily be stored in a
DNA memory with an appropriate word-to-sequence mapping.
Moreover, DNA is a nanoscale molecule that can be
manipulated in the test tube to read and write data with
well-characterized molecular biology protocols, as we have
demonstrated here. DNA can be attached to other
nanomaterials, such as carbon nanotubes (Williams et al.,
2001) and gold nanoparticles (Mirkin et al., 1996), and
has been used to assemble nanostructures from those
materials (Mirkin et al., 1996). Small amounts of DNA can
store vast amounts of information. The operations on DNA
molecules occur in parallel, which produces substantial
speed-ups for very large data sets. Moreover, DNA
computing is capable of universal computation (Paun,
1996), and computations could be done in vitro on the
information stored in the memory.
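
The capacity figures quoted in this discussion can be checked with a few lines of arithmetic. The sketch below counts the distinct 20-mers, compares that count with the size of the OED, and estimates the bits per gram of DNA. The average nucleotide mass of about 330 g/mol and the ideal coding density of 2 bits per nucleotide are assumptions introduced here; they are not values taken from the experiment.

```python
AVOGADRO = 6.022e23         # molecules per mole
NT_MASS_G_PER_MOL = 330.0   # assumed average mass of one nucleotide
BITS_PER_NT = 2             # four bases, so log2(4) = 2 bits (ideal case)

# Number of distinct sequences of length 20 over the alphabet {A, C, G, T}.
n_20mers = 4 ** 20

# Fraction of those sequences needed to give every OED word its own 20-mer.
oed_words = 500_000
fraction_used = oed_words / n_20mers

# Nucleotides in one gram, times bits per nucleotide.
bits_per_gram = AVOGADRO / NT_MASS_G_PER_MOL * BITS_PER_NT

print(f"distinct 20-mers:      {n_20mers:.3e}")
print(f"fraction used by OED:  {fraction_used:.3e}")
print(f"bits per gram of DNA:  {bits_per_gram:.3e}")
```

Under these assumptions the OED occupies well under one millionth of the available 20-mers, and the per-gram estimate lands in the range of $10^{21}$ bits, in line with the figure given in the text.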


3. CONCLUSION

With a huge capacity and massively parallel search
abilities, a DNA memory for abiotic data is a potentially
revolutionary way of storing and processing vast amounts
of information. In the future, large-scale data storage
and a limited intelligence will be added to the R/W DNA
memory in the form of context-based search. Thus, it
should be termed the Intelligent DNA memory. For example,
the entire database of customer information from Acxiom
Corporation (a database service company) could be stored,
searched, and processed in DNA form. The Acxiom database,
which is approximately 6 petabytes (1 petabyte is
~$10^{15}$ bytes) and is stored in a warehouse of tape and
disk farms half the size of a football field, could be
stored in a volume smaller than a drop of water. For the
Acxiom application, it is not enough to store and recall
information; because of imprecision in the customer data,
such as misspellings, abbreviations, and synonymous terms,
searching and retrieving data based upon both content and
context is advantageous.
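
As a rough plausibility check on the drop-of-water claim, the mass of DNA needed for a 6-petabyte database can be estimated from the storage density of ~$10^{21}$ bits per gram quoted in this paper. The sketch assumes 1 petabyte is $10^{15}$ bytes and ignores redundancy, addressing, and error-correction overhead, so it gives an idealized lower bound only.

```python
BITS_PER_GRAM = 1e21     # the paper's estimate for one gram of DNA
DATABASE_BYTES = 6e15    # 6 petabytes, assuming 1 PB = 10**15 bytes

database_bits = DATABASE_BYTES * 8
grams_needed = database_bits / BITS_PER_GRAM
micrograms_needed = grams_needed * 1e6

print(f"DNA needed: {grams_needed:.1e} g (about {micrograms_needed:.0f} micrograms)")
```

Under these idealized assumptions the database needs tens of micrograms of DNA, far less than the one gram that the text places in a single drop of water.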


